Police Shooting

Description

Features of the Kaggle dataset are:

  • id: Case ID
  • name: Name of the victim (nominal data)
  • date: Date of the shooting
  • manner_of_death: Whether the victim was shot, or shot and tasered
  • armed: Type of weapon the victim was armed with
  • age: Age of the victim
  • gender: Gender of the victim: M (male) or F (female)
  • race: Race of the victim, with 6 categories: Asian, Black, White, Native, Hispanic, and Other
  • city: City where the victim was shot
  • state: State where the victim was shot
  • signs_of_mental_illness: Whether the victim showed signs of mental illness (True or False)
  • threat_level: Whether the victim attacked, the threat was undetermined, or Other
  • flee: How the victim fled: Foot, Car, Not fleeing, or Other
  • body_camera: Whether the police had a body camera on (True or False)
  • arms_category: Category of the weapon the victim had
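One way to load the data with these features typed appropriately is sketched below with pandas. The column names come from the list above; the split into boolean and categorical columns and the function name `apply_schema` are our own assumptions, not part of the original analysis.

```python
import pandas as pd

# Assumed grouping of the columns listed above (our choice, not the dataset's).
CATEGORICAL_COLS = ["manner_of_death", "armed", "gender", "race", "city",
                    "state", "threat_level", "flee", "arms_category"]
BOOLEAN_COLS = ["signs_of_mental_illness", "body_camera"]


def apply_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with each feature cast to a suitable dtype."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    # Missing or malformed ages become NaN instead of raising an error
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    for col in BOOLEAN_COLS:
        # Accept real booleans as well as the strings "True"/"False"
        out[col] = out[col].astype(str).str.lower().map({"true": True, "false": False})
    for col in CATEGORICAL_COLS:
        out[col] = out[col].astype("category")
    return out
```

For example, assuming the Kaggle CSV was saved locally as `shootings.csv` (a hypothetical filename): `df = apply_schema(pd.read_csv("shootings.csv"))`.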

Exploratory Data Analysis (EDA)

What do we see from our map?

What does the histogram tell us?

When broken down by gender, we can see that far fewer females than males were shot and killed by police.
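The gender breakdown behind this histogram can be tabulated with a short pandas sketch like the one below, assuming the data is in a DataFrame with the `gender` column described above (the function name `gender_breakdown` is hypothetical). Note that these are shares of victims in the dataset, not rates relative to the population.

```python
import pandas as pd


def gender_breakdown(df: pd.DataFrame) -> pd.DataFrame:
    """Count victims per gender and give each count as a share of all cases.

    value_counts() drops missing genders by default, so shares are
    computed over rows with a recorded gender only.
    """
    counts = df["gender"].value_counts()
    shares = counts / counts.sum()
    return pd.DataFrame({"count": counts, "share": shares.round(3)})
```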

Most of the weapons that victims had in our dataset were guns, knives, toy weapons, and vehicles. Surprisingly, a large number of victims were also recorded as unarmed or as having an unknown weapon.
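A ranking like this can be read off the `armed` column with `value_counts`. The sketch below is a minimal version under that assumption; the function name `top_weapons` and the default of six entries are our own choices. Passing `dropna=False` keeps missing weapon entries as their own row instead of silently discarding them.

```python
import pandas as pd


def top_weapons(df: pd.DataFrame, n: int = 6) -> pd.Series:
    """Return the n most frequent values of the `armed` column.

    Missing entries are counted as a separate (NaN) category so that
    they are not silently dropped from the ranking.
    """
    return df["armed"].value_counts(dropna=False).head(n)
```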

Feature Engineering

PCA

As we can see, the first 28 principal components explain almost 100% of the variance in the data. We therefore kept these 28 components and dropped the last 13 of the original 41 to reduce overfitting and improve model efficiency.
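The selection rule above (keep the fewest components whose cumulative explained variance is close to 100%) can be sketched with NumPy's SVD. This is an illustration, not the original code: it assumes `X` is the already encoded and scaled 41-column feature matrix, and the 0.99 threshold and the name `pca_keep_components` are our own assumptions.

```python
import numpy as np


def pca_keep_components(X: np.ndarray, var_threshold: float = 0.99):
    """Project X onto the fewest principal components whose cumulative
    explained-variance ratio reaches var_threshold (0.99 is an assumed default).

    Returns (X_reduced, k, cumulative_ratio).
    """
    Xc = X - X.mean(axis=0)  # PCA operates on mean-centred data
    # Rows of Vt are the principal axes; squared singular values are
    # proportional to the variance captured along each axis.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # First index where the cumulative ratio reaches the threshold, as a count
    k = int(np.searchsorted(cumulative, var_threshold) + 1)
    k = min(k, len(s))  # guard against rounding leaving cumulative[-1] just below 1
    return Xc @ Vt[:k].T, k, cumulative
```

scikit-learn's `PCA` can make the same choice when `n_components` is given as a fraction such as `0.99`; the explicit version above just makes the cumulative-variance rule visible.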

Machine Learning Models

Train-Test Split